Abstract

Energy landscape theory describes how a full-length protein can attain its native fold by
sampling only a tiny fraction of all possible structures. Although protein folding is now
understood to be concomitant with synthesis on the ribosome, there have been few attempts to
modify energy landscape theory by accounting for cotranslational folding. Here we provide a
model for cotranslational folding that leads to a natural definition of a nested energy
landscape. By applying concepts drawn from submanifold differential geometry, the physics of
protein folding on the ribosome can be explored in a quantitative manner and conditions on
the nested energy landscapes for a good cotranslational folder are derived.

PACS numbers: 87.15.-v; 02.40.-k 



A fundamental problem in molecular biology is explaining how the three-dimensional
structure of a protein is encoded within its amino acid sequence. Inside the cell, proteins are
synthesised on the ribosome by sequential addition of residues to an elongating polypeptide
chain during a process called translation [1]. Translation accounts for the conversion of
genetic information to the primary sequence of a protein, but knowledge of how the molecule
then folds into a functional state is central to our understanding of the natural world. Energy
landscape theory provides a mechanism whereby the existence of intermediate structures,
each associated with a free energy cost, enables the folding pathway of a protein to be mapped
on a multidimensional potential energy landscape. If the global shape of the energy
landscape for a good folder resembles a funnel, only a small fraction of all possible
structures needs to be sampled before the protein attains its native fold [2]-[4].

Nascent proteins can begin to fold whilst they are still bound to a translating ribosome
[5], [6]. During cotranslational folding, the conformational space available to a protein
increases incrementally with addition of residues to the polypeptide chain. This can enhance
folding yields [7], provide an additional level of quality control [8], [9], and allow access to
folding pathways different from those available to a full-length protein [10], [11]. Many
studies (e.g. [12]-[14]) have highlighted a relationship between folding timescales and the delay
before amino acid addition ($\tau_a$), which can be modulated by codon usage and controlled by
the translational apparatus. Some research groups have also developed a theoretical
understanding of how protein folding is affected by varying translation rates (via $\tau_a$)
[15], [16], but so far there has not been a satisfactory attempt to modify energy landscape
theory by accounting for cotranslational folding.

Taking a geometric approach to this problem could reveal a relationship between $\tau_a$ and
the curvatures of the energy landscape. Moreover, it should account recursively for the
properties of the energy landscape at different chain lengths. In the past, pseudo-Riemannian
geometry has been successfully applied to folding of full-length proteins [17], [18], but the
questions we wish to address here lead to the development of a quite different theory specific
to cotranslational folding. The result is an analytical set of conditions that must be satisfied
by a group of nested energy landscapes for any protein folding cotranslationally on the
ribosome.

We begin our treatment as in [17], [18], by considering an enlarged $(N + 2)$-dimensional
configuration space with coordinates $q^0, q^1, \ldots, q^N, q^{N+1}$, where $q^1, \ldots, q^N$ are
the Lagrangian coordinates of a polypeptide of length $n$. Denote a potential energy function
on this space by $V_n = V_n(q^1, \ldots, q^N)$. In keeping with the assumptions of [15], [16], the
transition from nascent chain length $n$ to $(n + 1)$ is instantaneous relative to the times that
the ribosome spends at either of these chain lengths. Consequently, addition of a new amino
acid to the polypeptide results in a discrete jump of value $V_{n+1} - V_n$ in the potential energy
associated with any given conformation of the nascent chain.

The configuration spaces of the $n$th and $(n - 1)$th states can be represented as manifolds
$\bar{M}$ and $M$ respectively. We endow the manifold $\bar{M}$ with an Eisenhart metric [19]
whose arc length is

\[ ds^2 = \delta_{ij}\, dq^i dq^j - 2V_n\, (dq^0)^2 + 2\, dq^0 dq^{N+1}, \tag{1} \]

where the indices $i, j$ run from 1 to $N$. We want the projections of the geodesics of this
Eisenhart metric to be the natural motions of the Hamiltonian system and therefore the
folding trajectories of a protein in the $n$th state. These trajectories, parameterised by the
time coordinate $q^0 = t$, are obtained by taking $ds^2 = dt^2$ on the physical geodesics and
imposing the integral condition

\[ q^{N+1} = \frac{1}{2} t + c_0 - \int_0^t \left( \frac{1}{2} \delta_{ij}\, \dot{q}^i \dot{q}^j - V_n \right) dt \tag{2} \]

on the additional coordinate $q^{N+1}$. Here $c_0$ is some arbitrary real constant.
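
As a consistency check, the geodesic equations of the metric (1) can be written out explicitly. The sketch below assumes the standard non-vanishing Christoffel symbols of an Eisenhart-type metric, $\Gamma^i_{00} = \partial_i V_n$ and $\Gamma^{N+1}_{0i} = -\partial_i V_n$, together with $ds^2 = dt^2$ and $q^0 = t$. It is an illustration, not part of the original derivation.

```latex
\begin{align*}
\frac{d^2 q^0}{dt^2} &= 0, \\
\frac{d^2 q^i}{dt^2} &= -\Gamma^i_{00} \left( \frac{dq^0}{dt} \right)^2
                      = -\frac{\partial V_n}{\partial q^i}, \\
\frac{d^2 q^{N+1}}{dt^2} &= -2\, \Gamma^{N+1}_{0i}\, \frac{dq^0}{dt} \frac{dq^i}{dt}
                          = 2\, \frac{dV_n}{dt}.
\end{align*}
```

The second line is Newton's equation for the potential $V_n$, which is why the projected geodesics can be read as folding trajectories. The third line agrees with differentiating (2) twice when the total energy is conserved.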

Suppose there to be a differentiable isometric immersion $f : M \to \bar{M}$ so that for each
$p \in M$ there exists a neighbourhood of $p$ in $M$ whose image is a submanifold of $\bar{M}$. The
immersion $f$ is used to define what we mean by the $(n - 1)$th energy landscape being a
nested energy landscape of the $n$th. At $p$ we have the decomposition

\[ T_p \bar{M} = T_p M \oplus (T_p M)^{\perp}, \tag{3} \]

which states that the tangent vector space $T_p \bar{M}$ can be decomposed into a direct sum of the
tangent space $T_p M$ and its orthogonal complement $(T_p M)^{\perp}$. As a consequence, $M$ inherits
a metric and affine connection $\nabla$ from the Eisenhart metric of $\bar{M}$. For $X$ and $Y$ vector
fields on $M$ extended to $\bar{M}$ it can be shown that $\nabla_X Y$ is equal to the component of
$\bar{\nabla}_X Y$ tangential to $M$, where $\bar{\nabla}$ is the affine connection on $\bar{M}$ [20]. The
difference $\bar{\nabla} - \nabla$ uniquely defines the mapping $H : T_p M \times T_p M \to (T_p M)^{\perp}$
and the shape operator $S_Z$,

\[ \langle S_Z(X), Y \rangle = \langle H(X, Y), Z \rangle \tag{4} \]

along the direction $Z \in (T_p M)^{\perp}$.

Denote the time of addition of the $n$th amino acid by $t_n$, setting $t_n = 0$ and $t_{n+1} = \tau_a$.
Let $\gamma_0 \in M \subset \bar{M}$ be the point of the immersed space $M$ where the $n$th amino acid is
added to the nascent protein. That particular folding trajectory is then no longer constrained
to $M$, but continues as a geodesic $\gamma : [0, \tau_a] \to \bar{M}$ with initial tangent vector
$\dot{\gamma}_0 \in (T_{\gamma_0} M)^{\perp}$ that guarantees $\gamma$ will leave $M$. A fundamental conjecture of
the protein folding field is that the energy landscape is shaped such that trajectories
originating from different points will converge to a common fold [2]-[4]. We therefore expect
that a similarly constructed trajectory $\sigma$, resulting from a delay in addition of the $n$th
amino acid and emanating elsewhere on $M$, will converge with $\gamma$ after a given time interval.

The distance to $\sigma$ from any point along $\gamma$ is measured by the Jacobi vector field
$J \in T_{\gamma} \bar{M}$, which is everywhere orthogonal to the tangent vector field
$\dot{\gamma} \in T_{\gamma} \bar{M}$ and with suitable initial conditions must satisfy the Jacobi equation

\[ \bar{\nabla}_{\dot{\gamma}} \bar{\nabla}_{\dot{\gamma}} J = \bar{R}(\dot{\gamma}, J)\, \dot{\gamma}. \tag{5} \]

Here $\bar{R}$ is the Riemann curvature tensor on $\bar{M}$, whose only non-vanishing components in
our chosen coordinate chart are $\bar{R}_{0i0j} = \partial_i \partial_j V_n$. A small $\|J\|$ implies
stability along $\gamma$, whereas large $\|J\|$ is indicative of chaotic behaviour [21]; we call a
point along $\gamma$ at which $J$ vanishes and two geodesics converge to a common fold a focal point
of $M$. It is a logical assumption that folding pathways of a good cotranslational folder will
converge relatively quickly at each chain length, to prevent fluctuations in $\tau_a$ from
contributing to instability over time. Consequently, the time required for a protein domain of
length $n$ to stabilise before addition of the $(n + 1)$th amino acid is roughly the interval to
the first focal point of $M$ along $\gamma$.

From Proposition 10.35 in [22] it is possible to derive conditions on $M$ and $\bar{M}$ that
guarantee a focal point of $M$ over $(0, \tau_a]$. The quadratic form defined by

\[ h_{\dot{\gamma}_0}(X) = \langle H(X, X), \dot{\gamma}_0 \rangle \tag{6} \]

for some unit vector $X \in T_{\gamma_0} M$ is called the second fundamental form of $M$ at $\gamma_0$
along the direction $\dot{\gamma}_0$ [20]. Provided $h_{\dot{\gamma}_0}(X) \geq 1/\tau_a$ and the sectional
curvatures of all two-planes containing $\dot{\gamma}$ are non-negative, there is a focal point of
$M$ on $\gamma$ before addition of the $(n + 1)$th amino acid. This is a powerful result, but the
dependence of the conditions on arbitrary choices of vectors and two-planes makes it difficult
to grasp the link with nested energy landscapes. When the dimension $N$ is large it becomes
possible to model the sectional curvatures along $\gamma$ as a stochastic process $\mathcal{K}(t)$.
Details on how $\mathcal{K}(t)$ depends on the potential energy $V_n$ can be found in [21], [23];
however, when the chain length $n$ is small this approximation becomes unsuitable.
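
To get a feel for the bound taken from Proposition 10.35, one can study a scalar caricature of the Jacobi equation (5). The sketch below assumes a constant sectional curvature $K \geq 0$ along $\gamma$ and a single direction $X$ with $h_{\dot{\gamma}_0}(X) = k$, so that the separation obeys $J'' = -KJ$ with $J(0) = 1$ and $J'(0) = -k$. These simplifications, the sign convention for $J'(0)$, and the function names are assumptions made for illustration; they are not taken from the references.

```python
import math


def first_focal_time(k, K, t_max, dt=1e-4):
    """Approximate first zero of J on (0, t_max] for the toy model
    J'' = -K * J, J(0) = 1, J'(0) = -k (constant curvature K, assumed).
    Returns None if J has not vanished by t_max."""
    J, dJ, t = 1.0, -k, 0.0
    while t < t_max:
        # One velocity-Verlet step of J'' = -K * J.
        acc = -K * J
        J_new = J + dJ * dt + 0.5 * acc * dt * dt
        acc_new = -K * J_new
        dJ_new = dJ + 0.5 * (acc + acc_new) * dt
        if J_new <= 0.0:
            # Linear interpolation of the zero crossing inside this step.
            return t + dt * J / (J - J_new)
        J, dJ, t = J_new, dJ_new, t + dt
    return None


def closed_form_focal_time(k, K):
    """Exact first zero of the same toy model, valid for k > 0 and K >= 0."""
    if K == 0.0:
        return 1.0 / k  # flat case: J(t) = 1 - k * t
    root_K = math.sqrt(K)
    # J(t) = cos(sqrt(K) t) - (k / sqrt(K)) sin(sqrt(K) t)
    return math.atan(root_K / k) / root_K


# Hypothetical values: delay tau_a = 1 and k = 2 >= 1 / tau_a.
tau_a = 1.0
t_flat = first_focal_time(k=2.0, K=0.0, t_max=tau_a)    # close to 1 / k
t_curved = first_focal_time(k=2.0, K=1.0, t_max=tau_a)  # earlier than 1 / k
```

In this toy setting a non-negative $K$ can only bring the zero of $J$ forward, so $k \geq 1/\tau_a$ is enough to place it inside $(0, \tau_a]$, which mirrors the structure of the condition stated above.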

We would prefer to derive a more intuitive and exact relationship between $V_n$, $V_{n-1}$ and
the distance to the first focal point of $M$. This can be achieved by introducing a new
construction on $\bar{M}$, but at the cost of some ambiguity in the location of the focal point.
It is always possible to pick a hypersurface $P \subset \bar{M}$ through $\gamma_0$ orthogonal to
$\dot{\gamma}_0$ so that at $\gamma_0$ the shape operator of $P$ agrees with $S_{\dot{\gamma}_0}$. From Warner [24]
the first focal point of any such $P$ occurs at least as soon as the first focal point of $M$,
and we therefore choose $P$ to be the hypersurface whose first focal point occurs furthest
along $\gamma$. By adapting the proof of Proposition 10.37 in [22] we obtain a set of conditions
that must be satisfied if a focal point of $M$ is to occur over $(0, \tau_a]$. Provided

\[ -\frac{1}{N + 1}\, \mathrm{trace}(S_{\dot{\gamma}_0}) \geq \frac{1}{\tau_a} \tag{7} \]

and

\[ \mathrm{Ric}(\dot{\gamma}, \dot{\gamma}) = \Delta V_n \geq 0 \tag{8} \]

then there can exist a focal point of $M$ on $\gamma$ over the interval $(0, \tau_a]$. However, these
conditions do not guarantee existence absolutely. The operator
$\mathrm{Ric} : T_p \bar{M} \times T_p \bar{M} \to \mathbb{R}$ appearing in (8) is the Ricci tensor on $\bar{M}$,
whose only non-vanishing component in our chosen coordinate chart is $\bar{R}_{00} = \Delta V_n$.
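
The link between the curvature components quoted after (5) and the Ricci term in (8) can be made explicit by contracting indices. This is a short sketch, assuming that the inverse Eisenhart metric has $g^{ij} = \delta^{ij}$ and $g^{00} = 0$, and that $\dot{q}^0 = 1$ along $\gamma$; the two potentials used afterwards are hypothetical examples.

```latex
\begin{align*}
\bar{R}_{00} &= g^{ij} \bar{R}_{0i0j}
              = \sum_{i=1}^{N} \frac{\partial^2 V_n}{\partial (q^i)^2}
              = \Delta V_n, \\
\mathrm{Ric}(\dot{\gamma}, \dot{\gamma}) &= \bar{R}_{00}\, \dot{q}^0 \dot{q}^0 = \Delta V_n .
\end{align*}
```

For the isotropic harmonic potential $V_n = \frac{1}{2} \omega^2 \sum_i (q^i)^2$ this gives $\Delta V_n = N \omega^2 > 0$, so (8) holds everywhere. For the two-coordinate saddle $V_n = \frac{1}{2} \omega^2 \left[ (q^1)^2 - 2 (q^2)^2 \right]$ it gives $\Delta V_n = -\omega^2 < 0$, so (8) fails.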

The physical interpretations of conditions (7) and (8) follow, and are illustrated graphically
in Fig. 1. Expression (8) is a generalisation of the folding funnel hypothesis and states that
$\gamma$ must pass through a subharmonic region or "trough" of the potential $V_n$ if it is to be
a stable folding trajectory with which others converge. Trajectories passing over a saddle
region of the energy landscape will not converge. Condition (7) is slightly more abstract as
it depends on the form of the isometric immersion $f$ and the concept of the hypersurface $P$.
Essentially, the expression on the left-hand side of (7) can be thought of as an average of the
negative principal curvatures of $M$ at $\gamma_0$, which measures how strongly the $(n - 1)$th nested
energy landscape bends towards $\dot{\gamma}_0$. Intuitively, the larger this quantity is, the smaller
the time interval required for two folding trajectories emanating from $M$ to converge. The
curvature of $M$ required near $\gamma_0$ for two geodesics to converge before addition of the
$(n + 1)$th amino acid is inversely proportional to $\tau_a$, the delay before that addition. This
provides the recursion relation between folding at chain length $n$ and earlier events in the
translational pathway that dictate where the point $\gamma_0$ appears on $M$.

By imposing an even stricter condition on the value of $\Delta V_n$, not necessarily to be
satisfied by all energy landscapes, it turns out that we can again guarantee existence of the
first focal point over the interval $(0, \tau_a]$. Two points $p, q \in \gamma$ are said to be conjugate
if a Jacobi field vanishes at both $p$ and $q$, and so it follows that the first focal point of
$M$ occurs at least as soon as the first conjugate point along $\gamma$ (for a more convincing
argument see Corollary 2.3 in Warner [24]). From a theorem of Myers [25] we find that if

\[ \Delta V_n \geq (N + 1)\, C, \tag{9} \]

where $\pi / \sqrt{C} \leq L_{\gamma}$ ($L_{\gamma}$ being the length of $\gamma$ over $[0, \tau_a]$), then geodesics will
converge before addition of the $(n + 1)$th amino acid. Condition (9) alone is strong enough to
ensure that any two trajectories of suitable length with initial tangent vectors
$\dot{\gamma}_0 \in T_{\gamma_0} \bar{M}$ and $\dot{\sigma}_0 \in T_{\sigma_0} \bar{M}$ will converge over $(0, \tau_a]$. Satisfying
the condition implies the walls of the trough in $V_n$ containing $\gamma$ are sufficiently steep to
draw in all neighbouring trajectories no matter how they originate from the $(n - 1)$th nested
energy landscape.
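
Because $ds^2 = dt^2$ on the physical geodesics, condition (9) can be phrased directly in terms of $\tau_a$. The following is a sketch under that assumption, reading the inequalities as non-strict and taking the smallest admissible $C$; the harmonic potential used afterwards is a hypothetical example.

```latex
\begin{align*}
L_{\gamma} &= \int_0^{\tau_a} ds = \tau_a, \\
\frac{\pi}{\sqrt{C}} \leq L_{\gamma} \;&\Longleftrightarrow\; C \geq \frac{\pi^2}{\tau_a^2}, \\
\Delta V_n &\geq (N + 1)\, \frac{\pi^2}{\tau_a^2}
  \quad \text{(sufficient for (9) with } C = \pi^2 / \tau_a^2 \text{)}.
\end{align*}
```

For $V_n = \frac{1}{2} \omega^2 \sum_i (q^i)^2$, where $\Delta V_n = N \omega^2$, this reads $\omega \tau_a \geq \pi \sqrt{(N + 1)/N}$: the trough must be steep enough for roughly half an oscillation period to fit within the delay before the next amino acid is added.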

In this letter we have provided the geometric foundations of a nested energy landscape
theory for cotranslational protein folding. To summarise briefly, the nested energy
landscapes for any protein must satisfy certain geometric conditions if the nascent chain is to
attain a stable fold on the ribosome. Provided these are met, two folding trajectories leaving
the $(n - 1)$th landscape at different time points to enter the $n$th will converge to a common
fold before addition of the $(n + 1)$th amino acid. The first set of conditions, involving
sectional curvatures of the $n$th landscape and the second fundamental form of the $(n - 1)$th,
can be re-cast in an averaged, more intuitive version. Firstly, the region of the $(n - 1)$th
landscape that surrounds the point at which the $n$th amino acid is added must be sufficiently
curved towards the leaving trajectory so as to direct others towards it. The curvature
required is inversely proportional to the translation delay $\tau_a$. A second requirement is that
the folding trajectory lies within a trough rather than on a saddle of the $n$th landscape.
In some cases this second condition may be strengthened to ensure convergence of any two
trajectories by imposing a lower bound on the steepness of the trough. It is not difficult to
envisage the obvious extension of these results to more general scenarios involving growth
of atomic clusters or elongation of generic polymers.

FIG. 1. Simple scheme for an intuitive grasp of conditions (7) and (8). (A) Two trajectories leave the
immersed one-dimensional manifold $M = \mathbb{R}$ at different time points to enter the surface $\bar{M}$. Since
$\bar{M}$ is hyperbolic, the two trajectories will not converge. (B) In this case $M$ is a two-dimensional
surface immersed in $\bar{M} = \mathbb{R}^3$. Trajectories leave $M$ at different time points, but curvatures of the
immersed surface and ambient manifold are sufficient to allow convergence of $\gamma$ and $\sigma$ over the
interval $[0, \tau_a]$.